prolfquapp - Streamlining Protein Differential Expression Analysis in Core Facilities

Witold Wolski1,2, Bernd Roshitzki1, Jonas Grossmann1,2, Claudia Fortes1, Paolo Nanni1, Christian Panse1,2, Ralph Schlapbach1

1 Functional Genomics Center Zurich - ETH Zurich/University of Zurich (https://www.fgcz.ch/); 2 Swiss Institute of Bioinformatics (https://www.sib.swiss/)
  • Protein differential expression analysis (DEA) of results from:
    • DIANN
    • FragPipe DDA
    • FragPipe TMT
    • MaxQuant
  • Uses preprocessing and statistical models implemented in the R package prolfqua
    doi.org/10.1021/acs.jproteome.2c00441
  • Generates dynamic HTML reports
  • Exports results as XLSX files, .rnk and .txt files for GSEA and ORA

How To

Install R and prolfquapp

install.packages('remotes')
remotes::install_github('wolski/prolfquapp', dependencies = TRUE)

Create a directory with:

  • config.yaml (parameter file)
  • dataset.csv (experimental design)
  • the FASTA file
  • DIANN, FragPipe or MaxQuant results
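A quick check of the working directory can catch a missing input before the analysis starts. The following is a minimal sketch in base R; the helper name check_inputs and the assumption that the FASTA file name ends in .fasta are illustrative and not part of prolfquapp.

```r
# Hypothetical helper (not part of prolfquapp): report which of the
# expected input files are present in a working directory.
check_inputs <- function(dir = ".") {
  # Assumption: the FASTA file has the extension .fasta
  fasta <- list.files(dir, pattern = "\\.fasta$", ignore.case = TRUE)
  c(
    config  = file.exists(file.path(dir, "config.yaml")),
    dataset = file.exists(file.path(dir, "dataset.csv")),
    fasta   = length(fasta) > 0
  )
}

# Example: a temporary directory that lacks the FASTA file.
wd <- file.path(tempdir(), "dea_example")
dir.create(wd, showWarnings = FALSE)
file.create(file.path(wd, c("config.yaml", "dataset.csv")))
print(check_inputs(wd))
```

The quantification results (DIANN, FragPipe or MaxQuant) are not checked here because their file names depend on the software used.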

Copy the R code into the working directory by running one of the functions:

The content of the working directory is:

Finally, run source("FP_DIA.R") from the R console, or execute Rscript FP_DIA.R from the command line. This creates a subfolder with the DEA results:

  • DE_Groups_vs_Controls.html: report that describes the main steps of the analysis and shows the results.
  • DE_Groups_vs_Controls.xlsx: contains the raw and transformed abundances, the annotations, and the results of the differential expression analysis.
  • .rnk and .txt files for GSEA and ORA analysis.
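For readers unfamiliar with the format: a .rnk file for a preranked GSEA is a two-column, tab-separated table of identifier and ranking score. The sketch below shows how such a file could be written from a DEA result table. The helper write_rnk, the column names protein_Id and statistic, and the values are assumptions made for illustration; the export implemented in prolfquapp may differ.

```r
# Toy DEA result; column names and values are illustrative only.
dea <- data.frame(
  protein_Id = c("P1", "P2", "P3"),
  statistic  = c(-1.2, 3.4, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop missing scores, sort by score (descending)
# and write a two-column, tab-separated file without a header.
write_rnk <- function(x, file, id = "protein_Id", score = "statistic") {
  x <- x[!is.na(x[[score]]), c(id, score)]
  x <- x[order(x[[score]], decreasing = TRUE), ]
  write.table(x, file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(x)
}

rnk_file <- file.path(tempdir(), "example.rnk")
ranked <- write_rnk(dea, rnk_file)
```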

The entire working directory is archived. It contains all the data and R code needed to replicate the analysis.

Analysis parameters

The config.yaml file specifies the parameters of the analysis:

  • project-related information (e.g. projectID), which is shown in the HTML report
  • aggregation method (medpolish, rlm, top_3)
  • abundance transformation (robscale, vsn, none)
  • FDR and effect size thresholds
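
To give an idea of what the medpolish aggregation option does, the sketch below estimates a protein abundance per sample from peptide intensities with Tukey's median polish, using stats::medpolish from base R. The intensity values are made up, and whether prolfqua's implementation matches this sketch in every detail is an assumption.

```r
# Toy log2 peptide intensities for one protein:
# rows = peptides, columns = samples (values are made up).
pep <- matrix(
  c(20.1, 20.4, 21.0,
    18.0, 18.2, 19.1,
    22.3, NA,   23.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("pep", 1:3), paste0("S", 1:3))
)

# Median polish fits: intensity = overall + peptide effect + sample effect.
fit <- stats::medpolish(pep, na.rm = TRUE, trace.iter = FALSE)

# Protein abundance per sample: overall level plus sample (column) effect.
protein <- fit$overall + fit$col
print(round(protein, 2))
```

Because the fit is based on medians, a single outlying peptide or a missing value has little influence on the protein estimate.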


Sample annotation

The dataset.csv file contains the information about the measured samples:

  • Relative.Path/raw.file/channel (unique)
  • name - used in plots and figures (unique)
  • group - main factor
  • subject/bioreplicate (optional) - blocking factor
  • control - used to specify the control condition (C) (optional)

If subject is specified then the model is abundance ~ group + subject, otherwise abundance ~ group. The group differences to compute are determined from the group column and the control column.
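
The rule above can be sketched in a few lines of base R. The helpers model_formula and group_contrasts, the sample names, the label T for non-control samples, and the contrast naming scheme are assumptions made for illustration; they are not the prolfquapp implementation.

```r
# Toy sample annotation; file names and group labels are made up.
annot <- data.frame(
  raw.file = paste0("run", 1:6, ".raw"),
  name     = paste0("s", 1:6),
  group    = rep(c("ctrl", "drugA", "drugB"), each = 2),
  subject  = rep(c("p1", "p2"), times = 3),
  control  = rep(c("C", "T", "T"), each = 2),  # "T" is an assumed label
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the model formula from the available columns.
model_formula <- function(annot) {
  if ("subject" %in% names(annot)) {
    abundance ~ group + subject
  } else {
    abundance ~ group
  }
}

# Hypothetical helper: one contrast per non-control group vs. the control.
group_contrasts <- function(annot) {
  ctrl <- unique(annot$group[annot$control == "C"])
  stopifnot(length(ctrl) == 1)
  other <- setdiff(unique(annot$group), ctrl)
  paste0(other, "_vs_", ctrl)
}

print(model_formula(annot))
print(group_contrasts(annot))
```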

HTML report

  • Project related information (project ID etc)
  • Short introduction to DEA
  • Summarizes the design of the experiment
  • Summarizes protein identification and quantification:
    missingness, CV, clustering, PCA
  • DEA results with volcano plots and tables (the two are linked interactively using crosslink)
  • Explains the outputs and gives pointers to GSEA and ORA
  • Additional QC report
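
The volcano plots highlight proteins that pass both the FDR and the effect size threshold. A minimal sketch of that classification in base R follows. The column names, the threshold values, and the use of the Benjamini-Hochberg adjustment are assumptions made for illustration; in prolfquapp the thresholds come from config.yaml.

```r
# Toy DEA result; values are made up for illustration.
dea <- data.frame(
  protein_Id = paste0("P", 1:5),
  diff       = c(2.1, -1.5, 0.3, 1.2, -0.1),    # log2 fold change
  p.value    = c(0.0005, 0.004, 0.03, 0.2, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds.
fdr_threshold  <- 0.05
diff_threshold <- 1

# Adjust p-values (Benjamini-Hochberg), then apply both thresholds.
dea$FDR <- p.adjust(dea$p.value, method = "BH")
dea$significant <- dea$FDR < fdr_threshold & abs(dea$diff) > diff_threshold
print(dea)
```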

Conclusion

  • Integrates into a LIMS system
    doi.org/10.1515/jib-2022-0031
  • The archived directory contains all information needed to replicate the analysis:
    you can rerun the analysis on your PC
  • Our users know Excel and like XLSX files
  • Shiny app in development